Turkey coronavirus (TCoV), one of the least characterized of all known coronaviruses, was isolated from 
an outbreak of acute enteritis in young turkeys in Ontario, Canada, and the full-length genomic sequence 
was determined. The full-length genome was 27,632 nucleotides plus the 3' poly(A) tail. Two open reading 
frames, ORFs 1a and 1b, resided in the first two thirds of the genome, and nine additional downstream 
ORFs were identified. A gene for hemagglutinin-esterase was absent in TCoV. The region between the 
membrane (M) and nucleocapsid (N) protein genes contained three potential small ORFs: ORF-X, a previously 
uncharacterized ORF with an associated putative TRS within the M gene (apparently shared among 
all group III coronaviruses), and previously described ORFs 5a and 5b. The TCoV genome is organized as 
follows: 5' UTR - replicase (ORFs 1a, 1b) - spike (S) protein - ORF3 (ORFs 3a, 3b) - small envelope (E or 3c) 
protein - membrane (M) protein - ORF5 (ORFs X, 5a, 5b) - nucleocapsid (N) protein - 3' UTR - poly(A). 
The TCoV genome structure and sequence were most similar to, but distinct from, those of avian infectious bronchitis virus 
(IBV). This is the first complete genome sequence for a TCoV and confirms that TCoV belongs to group III 
coronaviruses. 

© 2008 Elsevier B.V. All rights reserved. 


1. Introduction 

Turkey coronavirus (TCoV) is associated with highly contagious 
gastroenteritis in young poults. First identified in 1951 (Peterson 
and Hymass, 1951), TCoV causes high morbidity, some mortality, 
and poor long-term growth of the affected birds, resulting in significant 
economic losses in the turkey industry. Outbreaks have 
been reported in different areas in the USA including Minnesota, 
North Carolina, and Indiana as well as in Quebec, Canada (Dea et 
al., 1986). Turkey coronavirus was subsequently determined to be 
the causative agent of blue comb disease during an investigation of 
an outbreak of acute, highly contagious enteritis in a flock of young 
turkeys (Nagaraja and Pomeroy, 1997). In addition to turkeys, TCoV 
can infect a variety of avian hosts including chickens, pheasants, sea 
gulls, and quail (Deshmukh and Pomeroy, 1974). Recently, TCoV was 
incriminated as one of the most important causative agents of poult 
enteritis and mortality syndrome (Barnes and Guy, 1997; Teixeira 
et al., 2007). 

Coronaviruses are divided into three groups (I, II, and III) based 
on the genome structure and organization (Holmes and Lai, 1996). 
The coronavirus genome is a single-strand positive-sense RNA 
with a 5' cap and a poly(A) tail at the 3' end. Two large open 
reading frames (ORFs) occupy the 5'-proximal two thirds of the 
genome and are involved in polyprotein processing, genome replication, 
and subgenomic RNA synthesis, and the 3' one third of 
the genome codes for structural proteins. ORF1 consists of two 
overlapping ORFs (ORF1a and ORF1b) that are translated into 1a 
and 1a/1b polyproteins by a ribosomal frame-shifting mechanism. 
ORF1a encodes two proteases: papain-like cysteine protease (PLP) 
and picornavirus 3C-like chymotrypsin protease (3CLP). Both proteases 
cleave the polyproteins into at least 16 cleavage products 
(Sawicki et al., 2007). An intergenic consensus sequence of about 
seven bases is found immediately upstream of each gene and 
plays an important role in subgenomic RNA synthesis (Sawicki 
et al., 2007). Most coronaviruses code for four major structural 
proteins: spike (S) glycoprotein, membrane (M) protein, small 
envelope (E) protein, and nucleocapsid (N) protein, in addition 
to hemagglutinin esterase (HE) glycoprotein in some group II 
coronaviruses. 

The genome structure and organization are not known for any 
TCoV, and this lack of genetic information has made it difficult to 
develop effective detection and control measures for TCoV. The goal 
of the present study was to complete the full genomic sequence 
of TCoV and establish a better understanding of the virus at the 
molecular level. 

2. Materials and methods 

2.1. Virus 

Clinical specimens consisting of intestinal tracts were collected 
from turkey poults during an outbreak of diarrhea in a turkey farm 
in Ontario, Canada. The tissue samples were processed by washing 
several times with phosphate-buffered saline (PBS) and cut into small 
pieces prior to grinding using a sterile mortar and pestle. Coarse 
particles and tissue debris were removed by centrifugation at 
3000 × g for 30 min at 4 °C. The supernatants were filtered through 
0.22-μm filters (Millipore, Bedford, MA). Virus was isolated from 
the filtrates by inoculating embryonated turkey eggs. Allantoic 
fluids were collected 3 days post-inoculation, then clarified by centrifugation 
at 3000 × g for 30 min at 4 °C, and the supernatants were 
filtered through 0.22-μm filters and stored at -80 °C until use. 

2.2. RNA extraction, RT-PCR and PCR 

Complementary DNA (cDNA) was synthesized using 10.5 μl of 
the first strand mixture (Invitrogen, Carlsbad, CA) containing 0.2 μg 
of random hexamers and 2 μg of total RNA isolated from the intestinal 
specimens. The mixture was incubated at 70 °C for 10 min and 
then quick-chilled on ice for 5 min. RT master mix was composed of 
4 μl 5× RT buffer (Invitrogen), 2 μl 10 mM DTT, 2 μl 10 mM dNTPs 
(Amersham, Piscataway, NJ), 1 μl Superscript II reverse transcriptase 
(Invitrogen), and 0.5 μl RNAse inhibitors (Invitrogen). This RT 
master mix was added to 10.5 μl of the first strand mixture and then 
incubated at 42 °C for 2 h. The reaction was terminated by heating 
at 95 °C for 10 min then chilling on ice for 5 min. 

Fifty-microliter PCR reactions included 2 μl of cDNA and 48 μl 
of the master mix composed of 5 μl 10× buffer (100 mM Tris-HCl, 
500 mM KCl), 1.5 μl of 15 mM MgCl2, 1.5 μl of 10 mM dNTPs, 2 pmol 
of upstream and downstream primers, two units of Taq High 
Fidelity DNA polymerase (Invitrogen), two units of Amplitaq gold 
(Roche Molecular Systems, Inc., USA) plus 38 μl nuclease-free water. 
PCR was performed for 35 cycles as follows: 95 °C for 1 min, 65 °C for 
1 min, 55 °C for 30 s, 72 °C for 3 min, followed by final extension at 
72 °C for 10 min. The PCR products were analyzed by 1% agarose gel 
electrophoresis and visualized by staining with ethidium bromide 
and UV illumination. 

2.3. DNA cloning and sequence analysis 

Thirteen overlapping PCR fragments spanning the entire viral 
genome were amplified using specific primer sets (Table 1). The 
PCR products were purified using the QIAquick PCR purification 
kit (Qiagen, Valencia, CA) and cloned in pGEM-T Easy (Promega, 
Madison, WI). Transformants were screened by restriction enzyme 
digestion and sequencing using primers specific for the T7 and SP6 
promoters. The sequences were analyzed using the Sequencher 
4.5 sequence analysis program, and a single contiguous sequence 
comprising the entire TCoV genome was constructed. Prediction 
of ORFs was conducted using Vector NTI Advanced 10 (Invitrogen), 
and the sequences were analyzed using Lasergene DNA 
STAR (version 7, Lasergene Corp, Madison, WI). The pairwise 
nucleotide identity was determined using Vector NTI Advanced 10 
and multiple sequence alignments were generated using Clustal-W 
(Thompson et al., 1997). Comparative analyses of TCoV with other 
coronaviruses were conducted using the Coronavirus Database 
(CoVDB, Huang et al., 2007). 
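
The pairwise identity calculation described above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to the same length (for example, by ClustalW) and counts matching columns, skipping any column in which either sequence has a gap. The function name `percent_identity` and the gap convention are assumptions made for illustration; Vector NTI may treat gaps differently. The two example strings are an IBV row and the TCoV-MG10 row as they appear in the Fig. 2B alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Rows taken from the Fig. 2B alignment (frameshift region only, not whole genes).
IBV_ROW = "GATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAG-GCTCGGCTGATACC"
TCOV_ROW = "GGTAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAG-GCTCGGCTGATACC"
IDENTITY = percent_identity(IBV_ROW, TCOV_ROW)
print(f"{IDENTITY:.1f}% identity over the aligned frameshift region")
```

Because this short region differs at a single column, the value is much higher than the whole-gene identities reported in the Results; the sketch only shows how such a percentage is derived.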

2.4. The 5' end of the genome 

A cDNA clone representing the 5' end of the TCoV-MG10 genome 
was synthesized according to the 5' RACE System for rapid 


Table 1 

Oligonucleotide primers used for TCoV genome amplification and their positions in 
the genome 



a Primers are designated as forward (F) or reverse (R). 

amplification of cDNA ends (Invitrogen). The antisense primer 
was designed based on the available TCoV-MG10 sequence 
(5'-CGCCAGGTGTTAl 111GTCA), and cDNA was synthesized as 
described above. The cDNA was purified using Qiagen column 
purification kits. Tailing of the cDNA was done using dCTP and 
TdT. PCR was then performed to amplify the dC-tailed cDNA with the 
abridged anchor primer together with the designed primer 
(5'-GTTGTCACTGTCTATTGTATG) according to the instructions of the 
kit. 

2.5. The 3' end of the genome 

The 3' end of the TCoV genome was determined using the 3' RACE 
System for rapid amplification of cDNA ends (Invitrogen) according 
to the instructions of the kit. cDNA was synthesized using the 
adaptor primer, and the cDNA was then amplified by PCR using the 
TCoV-MG10 specific primer (5'-CTATCGCCAGGGAAATGTCT-3') and 
the universal amplification primer according to the instructions of 
the kit. The PCR products obtained were cloned and sequenced 
in both directions as described above. The sequence obtained 
aligned with the rest of the genomic sequence and ran through 
the poly(A) tail. 

2.6. TCoV-MG10 phylogenetic relationships 

Phylogenetic relationships among coronaviruses were investigated 
using the following complete genomes representing groups 
I, II, and III: BCoV-ENT (NC_003045), HCoV-NL63 (DQ445912.1), 
HCoV-OC43 (NC_005147.1), IBV-M41 (AY851295), IBV-p65 
(DQ001339.1), IBV-Cal99 (AY514485), IBV-NC (NC_001451), 
SARSCoV-BJ202 (AY864806.1), HCoV-229E (NC_002645.1), MHV-JHM 
(NC_006852.1) and FIPV WSU-79/1146 (NC_007025.1). Whole 
genome sequences were aligned using ClustalW (Thompson et 
al., 1997) and subsequently optimized by eye using the Geneious 
software (Version 3.0.6, Biomatters Ltd, Auckland, New Zealand) 
(Kumar et al., 2004). Aligned nucleotide sequences were analyzed 
using PAUP (version 4.0), and maximum likelihood (HKY85 model 
with transition/transversion ratio of 2) and maximum parsimony 
(transversion/transition ratio of 2) search criteria were used with 
branch and bound search strategies (Swofford, 1991). Bootstrap 
supports for the resulting trees were determined using 100 replicate 
heuristic tree searches in both parsimony and likelihood 
analyses using the same search criteria. For more detailed ingroup 
analysis of relationships among the group III coronaviruses, amino 
acid sequences for the spike glycoprotein (S), envelope protein (E, 
also known as 3c) and the nucleocapsid (N) protein were aligned 
using ClustalW and subsequently analyzed using maximum 
parsimony (branch and bound search algorithm) followed by 
bootstrap analysis using a heuristic search method. The following 
additional group III coronavirus sequences were used for the comparative 
sequence analyses: TCoV-Gh S gene (AY342356), TCoV-GI 
S gene (AY342357), Quail coronavirus Italy/Elvia/2005 S1 gene 
(EF446155.1), TCoV-NC95 N gene (AF111997), TCoV-Minnesota N 
gene (AF111996), TCoV-Indiana N gene (AF111995). All ingroup 
translation products (S, E and N proteins) from the whole genomes 
of the same group I and II coronaviruses used above were included 
in these analyses. The group I and II coronaviruses were considered 
functional outgroups for determining relationships among the 
group III coronaviruses. 

2.7. GenBank accession 

The TCoV full genomic sequence described in this report 
was deposited in the GenBank database with accession number 
EU095850. 

3. Results 

3.1. Complete TCoV genomic sequence and organization 

A TCoV was isolated from an outbreak in an Ontario turkey farm 
and designated TCoV-MG10. Subsequently, the full-length genomic 
sequence of TCoV-MG10 was determined by sequencing of overlapping 
PCR fragments in both directions. The sequences were 
assembled into one contiguous sequence to represent the entire 
viral genome. A sequence of 27,632 nucleotides was obtained, 
plus the polyadenylation tail at the 3' end. The entire genome has a 
GC content of 38.3%. The TCoV genome contained two large, slightly 
overlapping ORFs in the 5' two thirds of the genome and multiple 
additional ORFs in the 3' one third of the genome (Fig. 1). Both 
termini were flanked with untranslated regions (UTRs). The TCoV 
genome was similar overall in its coding capacity and genomic 
organization to those of other coronaviruses. Eleven ORFs were 
identified in the genome (Table 2). Gene 1 was 19,806 nucleotides 
comprising ORF1a and ORF1b, located between nucleotides 529 
and 20,333. This gene contained motifs common to all coronaviruses, 
including ribosomal frameshifting and slippery sequences, 
as ORF1b is translated in the -1 frame. The typical coronavirus 
structural genes encoding the spike (S), small envelope (E, also 
known as 3c), membrane (M), and nucleocapsid (N) proteins were 
identified following Gene 1 (Table 2, Fig. 1). The TCoV genome 
had polycistronic genes 3 and 5 interspersed between the S and 
E genes, and between M and N genes, respectively. In IBV, Gene 3 
is believed to be tricistronic, consisting of 3a, 3b and 3c, where 3c 
has been shown to encode the small envelope (E) protein. Gene 
5 contained a coding potential for two products, 5a and 5b, of 
unknown function. A third ORF of 282 nucleotides was located 
between the M gene and Gene 5a. This gene, designated ORF-X, 
contained a coding potential for a protein of 94 amino acids 
that had no structural or sequence homology with any known 
protein. BLAST searches failed to identify any protein homologs, 
but identified highly similar nucleotide sequences from other 
TCoV isolates as well as from numerous IBV isolates. ORF-X had 
a relatively distant but highly conserved putative transcription 
regulatory sequence (TRS). In summary, the genome organization 
for TCoV was determined as follows: 5' UTR-Gene 1 (ORF1a, 1b)- 

Fig. 1. Turkey coronavirus (TCoV) MG10 genome organization. The full-length TCoV genome is 27,632 nucleotides excluding the polyadenylation tail at the 3' end. Center: 
diagrammatic representation of the genome organization shows the predicted genes and their relative sizes and positions along the TCoV genome. S, spike glycoprotein gene; 
3a, 3b, and 3c (E), tricistronic gene 3; M, membrane protein gene; ORF-X, unique ORF conserved among group III coronaviruses; 5a and 5b, bicistronic gene 5; N, nucleocapsid 
protein gene; UTR, untranslated region. Scales indicate relative positions of the various genes within the genome in kilobases. Top: expanded representation of the two ORFs 
(ORF1a and ORF1b) comprising the polycistronic gene 1 and the likely cleavage products and cleavage sites after proteolytic processing of the 1a/1b polyprotein. Bottom: 
expanded representation of the S gene indicating the signal peptide (SigP), putative cleavage site (S1/S2), ectodomains (clear bars), endodomain (hatched bar) and a short 
transmembrane region (solid bar). 

Table 2 

The genome organization and predicted viral proteins encoded by TCoV 

Gene       Frame   Start    Stop     Size (nt)   Size (aa)
5' UTR             1        528      528
ORF1a      +1      529      12381    11853       3951
ORF1ab             529      20333    19806       6601
S          +2      20360    24037    3678        1226
G3a        +3      23988    24158    171         57
G3b        +2      24161    24352    192         64
G3c (E)    +3      24336    24632    297         99
M          +2      24637    25305    669         223
ORF-X      +1      25309    25590    282         94
G5a        +1      25669    25863    195         65
G5b        +3      25863    26108    246         82
N          +2      26054    27280    1227        409
3' UTR             27281    27632    352

ORF1a/1ab non-structural proteins: positions and sizes 

Protein   Genomic positions^a (nt)   1a/1ab positions (aa)   C-end cleavage   Size (aa)   Identity^b to IBV (%)   Possible motif^c
NSP1      529-?                      M1-?                    ?                ?
NSP2      ?-2547                     ?-673G                  AG/GK            ~673        91.4                    P87
NSP3      2548-7323                  G674-2265G              AG/GI            1592        89.1                    PLP
NSP4      7324-8865                  G2266-2779Q             LQ/AG            514         87.4
NSP5      8866-9786                  A2780-3086Q             LQ/SS            307         92.8                    3CLP
NSP6      9787-10665                 S3087-3379Q             VQ/SK            293         92.5                    HD
NSP7      10666-10914                S3380-3462Q             LQ/SV            83          96.4
NSP8      10915-11544                S3463-3672Q             LQ/NN            210         96.2
NSP9      11545-11877                N3673-3783Q             LQ/SK            111         99.1
NSP10     11878-12312                S3784-3928Q             VQ/SV            145         95.2                    GFL
NSP11     12313-12381                S3929-3951Q                              23          91.3
NSP12     12313-15131                S3929-4870Q             LQ/SC            941         97.2                    RdRp
NSP13     15132-16931                S4871-5470Q             LQ/GT            600         97.3                    HEL
NSP14     16932-18485                G5471-5988Q             LQ/SI            518         96.2                    ExonN
NSP15     18486-19499                S5989-6326Q             LQ/SA            338         95.3                    NendoU
NSP16     19500-20333                S6327-6604S                              278         95.0                    2'-O-MT

a Not including stop codons. 
b IBV Beaudette (GenBank accession NC_001451). 
c PLP, papain-like protease; HD, hydrophobic domain; 3CLP, 3C-like proteinase; GFL, growth factor-like domain; RdRp, RNA-dependent RNA polymerase; HEL, helicase 
domain; ExonN, exoribonuclease; NendoU, nidoviral uridylate-specific endoribonuclease; 2'-O-MT, 2'-O-ribose methyltransferase. 


S-Gene 3 (ORFs 3a, 3b, E)-M-Gene 5 (ORFs X, 5a, 5b)-N-3' UTR 
(Fig. 1). 
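
As a consistency check on the coordinates reported in Table 2, the following minimal sketch recomputes each ORF length from its start and stop positions and confirms that it is a whole number of codons matching the listed amino acid count. It assumes the coordinates are 1-based and inclusive and exclude the stop codon (per the table footnote); the names `TABLE2_ORFS` and `check_orf` are illustrative only. ORF1ab is left out because its listed size (19,806) is one nucleotide longer than the plain start-to-stop span, presumably reflecting the -1 frameshift.

```python
# (gene, start, stop, size_nt, size_aa) as listed in Table 2.
# Assumption: coordinates are 1-based, inclusive, and exclude the stop codon.
TABLE2_ORFS = [
    ("ORF1a", 529, 12381, 11853, 3951),
    ("S", 20360, 24037, 3678, 1226),
    ("G3a", 23988, 24158, 171, 57),
    ("G3b", 24161, 24352, 192, 64),
    ("G3c (E)", 24336, 24632, 297, 99),
    ("M", 24637, 25305, 669, 223),
    ("ORF-X", 25309, 25590, 282, 94),
    ("G5a", 25669, 25863, 195, 65),
    ("G5b", 25863, 26108, 246, 82),
    ("N", 26054, 27280, 1227, 409),
]


def check_orf(start: int, stop: int, size_nt: int, size_aa: int) -> bool:
    """True if the span matches the listed size and is a whole number of codons."""
    span = stop - start + 1
    return span == size_nt and span % 3 == 0 and span // 3 == size_aa


RESULTS = {gene: check_orf(*rest) for gene, *rest in TABLE2_ORFS}
for gene, ok in RESULTS.items():
    print(f"{gene:8s} {'consistent' if ok else 'MISMATCH'}")
```

The same arithmetic applies to the S1/S2 split described later: 542 + 684 amino acids equals the 1226 residues listed for S.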

3.2. 5' UTR 

The 5' terminus of the TCoV genome was characterized by the 
presence of a 528-nucleotide UTR with a higher GC 
content (50.4%) than the genome as a whole (38.3%). The TCoV 
5' UTR showed a high degree of sequence similarity with that of 
most IBV isolates such as IBV-NC (98% identity). 

3.3. Gene 1: viral replicase 

Following the 5' UTR, Gene 1 of 19,806 nucleotides was located. 
Gene 1 included two slightly overlapping ORFs, 1a and 1b. ORF1a 
was 11,853 nucleotides in size, enabling it to code for a protein of 
3951 amino acids, and ORF1b was 7992 nucleotides in length for a 
coding potential of 2664 amino acids. Those two ORFs overlapped 
by 40 nucleotides. ORF1b did not contain a typical AUG translational 
initiation codon but instead started with GAA at position 12,342 
(Fig. 2A). The heptanucleotide slippery sequence (UUUAAAC), 
which is conserved among all coronaviruses, was present (Fig. 2B). 
The ribosomal frame-shifting mechanism therefore seemed 
to be applicable to TCoV, and ORF1b was believed to be translated 
by this mechanism as a fusion with 
ORF1a to make the 1a/1b polyprotein. For most coronaviruses, 
the polyprotein is processed into 16 cleavage products, while the 
IBV polyprotein is likely cleaved into 15 products (Ziebuhr et al., 
2000). As with other coronaviruses, the TCoV polyprotein was also 
believed to undergo proteolytic processing by viral proteinases. The 
TCoV replicase protein was similar to that of IBV in its processing 
patterns. Since the potential cleavage sites were conserved for 
both viruses (Table 2), the TCoV polyprotein was assumed to be 
processed in a similar manner. Two main proteases used by coronaviruses 
have been identified: PLP (papain-like proteinase), which 
produces 2 or 3 N-terminal products of the polyprotein, and 3CLP 
(3C-like protease), which produces the central and the C-terminal 
region of the polyprotein by cleaving 11 sites (Ziebuhr et al., 2000). 
Similar to SARS-CoV, IBV encodes only one PLP, whereas other coronaviruses 
code for two PLPs. In the case of IBV, the PLP equivalent 
to PLP2 of other coronaviruses cleaves the polyprotein at the QS 
dipeptide to produce the N-terminal 100 kD proteins, and TCoV PLP 
seemed to function in a similar fashion. TCoV 3CLP was mapped in 
NSP5. Based on the sequence comparisons between TCoV and IBV, 
the replicase cleavage products and their putative functions were 
predicted (Fig. 1, Table 2). 
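
The -1 frameshift described above can be sketched on the short TCoV-MG10 region shown in Fig. 2A. The code below is a simplified model, not the authors' analysis: it assumes the ribosome decodes through the slippery heptanucleotide in the ORF1a frame and then slips back one nucleotide, so that the last base of the heptanucleotide is read again as the first base of the next codon. The function names are illustrative, and the sequence is written as DNA (T for U).

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SLIPPERY = "TTTAAAC"  # UUUAAAC in RNA
# TCoV-MG10 region from Fig. 2A, written in the ORF1a reading frame.
REGION = "GGTAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAGGCTCGGCTGATACCCC"


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def frameshift_fusion(dna: str, slippery: str = SLIPPERY) -> str:
    """Read in frame through the slippery site, then continue in the -1 frame."""
    site = dna.find(slippery)
    if site < 0:
        raise ValueError("slippery sequence not found")
    end = site + len(slippery)
    if end % 3 != 0:
        raise ValueError("slippery site does not end on a codon boundary")
    upstream = translate(dna[:end])        # ORF1a frame, up to the slip
    downstream = translate(dna[end - 1:])  # -1 frame: re-read the last base
    return upstream + downstream


ORF1A_FRAME = translate(REGION)
FUSION = frameshift_fusion(REGION)
print(ORF1A_FRAME)  # unshifted reading, which runs into the ORF1a stop codon
print(FUSION)       # shifted reading, which continues into ORF1b
```

Under these assumptions the unshifted frame gives GKNYLNGYGVAVRLG*YP, matching the +1 frame in Fig. 2A, while the shifted reading gives GKNYLNRVRGSSEARLIP and bypasses the ORF1a stop codon.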

It has been suggested that NSP1 does not exist in IBV (Ziebuhr et 
al., 2007). Due to the high degree of similarity between TCoV and 
IBV in the N-terminal region of the polyprotein 1a/1b, TCoV also 
seemed not to contain NSP1. Thus, for TCoV, NSP2 through NSP11 
would be produced from ORF1a, while NSP12 through NSP16 would 
be produced from ORF1b. The N-terminal-most cleavage product, NSP2, 
was predicted to be located between nucleotide positions 529 and 2549. 
The sequence analysis for NSP1/2 revealed 44%, 55%, and 90% identity 
to BCoV, HCoV-229E, and IBV, respectively. PLP1 was suggested 
to have been lost during IBV evolution (Ziebuhr et al., 
2001), and our sequence data also support the view that TCoV contained 
only PLP2. Therefore, PLP2 would be responsible for cleavage of the 

(A) TCoV-MG10 frameshift region (position ruler in the figure: 12340, 12350, 12360, 12370, 12380, 12390) 

     GGTAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAGGCTCGGCTGATACCCC 
+1   GKNYLNGY GVAVRLG . YP 
+2   V R I I TGTG . Q . GSADTP 
+3   ELFKRVRGSSEARLIP 

(B) Putative ribosomal frame-shifting sites 

Virus        Group   Position   GenBank No.
HCoV-229E    I       (12503)    NC_002645.1
FIPV         I       (12162)    NC_007025.1
HCoV-NL63    I       (12407)    DQ445912.1
BCoV         II      (13324)    NC_003045
HCoV-OC43    II      (13324)    NC_005147.1
MHV-JHM      II      (13601)    NC_006852.1
IBV-Cal99    III     (12340)    AY514485
IBV-NC       III     (12337)    NC_001451
IBV-M41      III     (12343)    AY851295
IBV-p65      III     (12334)    DQ001339.1
TCoV-MG10    III     (12337)    EU095850

Aligned sequences, grouped as in the figure: 

GATACTAATTTTTTAAACGAGTCCGGGGCTCTAGTGCA-GCTCGACTAGAG 
GATACTAATTTTTTAAACGAGTGCGGGGTTCTAGTGCA-GCTCGACTAGAA 

GATACTAATTTTTTAAACGGGTTCGGGTTCGGGTACGAGTGTAGATGCCCGTCT 
GATACTAATTTTTTAAACGGGGTTCGGGTTCGGGTACGAGTGTAGATGCCCGTCT 
GACACGAATTTTTTAAACGGGTTCGGGGTACAAGTGTAAATGCCCGTCTTGTACC 

GATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAG-GCTCGGCTGATACC 
GATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAG-GCTCGGCTGATACC 
GATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAG-GCTCGGCTGATACC 
GATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAG-GCTCGGCTGATACC 
GGTAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAG-GCTCGGCTGATACC 

Fig. 2. TCoV putative ribosomal frameshifting regions. (A) Region of the TCoV containing the heptanucleotide slippery sequence conserved among coronaviruses and the 
ORF1a stop codon. The amino acids in the joining region to produce the 1a/1b fusion protein are indicated in bold. Numbers indicate the positions of these features in the 
TCoV genome. (B) Putative ribosomal frame-shifting sites in coronaviruses representing groups I, II, and III and ORF1a stop codons for the group III viruses are found in 
the shaded regions of this nucleotide alignment. The illustrated motif is highly conserved among Group III coronaviruses. HCoV, human coronavirus; FIPV, feline infectious 
peritonitis virus; BCoV, bovine coronavirus; MHV, murine hepatitis virus; IBV, infectious bronchitis virus; TCoV, turkey coronavirus. 



N-terminal part of the polyprotein at two sites between NSP2/NSP3 
(AG/GK) and NSP3/NSP4 (AG/GI). PLP2 was identified in NSP3. NSP3 
was the largest subunit of the replicase cleavage products and was 
highly conserved between TCoV-MG10 and the IBV Beaudette strain, as 
both viruses share 89% identity at the nucleotide level. In contrast, 
3CLP would cleave the 1a/1b polyprotein at 11 sites and generate NSP5 
through NSP16 (Table 2). The motif for 3CLP was found in NSP5. The 
1a/1b polyprotein contained the motifs found in other nidoviruses 
(Fig. 1, Table 2). An earlier study identified an 87 kD (P87) protein 
encoded in this region (Lim and Liu, 1998), and the P87 homolog 
was also found in TCoV-MG10 as a 673-amino-acid protein in the 
same region. 

3CLP was located in NSP5 of 307 amino acids. NSP5 was similar 
by 52%, 44%, and 92% at the nucleotide sequence level and 
by 39%, 44%, and 93% at the amino acid level to HCoV-229E, 
BCoV, and IBV, respectively. This region was believed to play a 
critical role in ORF1b processing, since the deletion of NSP5 in 
IBV left the 1b protein unprocessed (Liu et al., 1997). 
NSP3, NSP4 and NSP6 were predicted to carry a hydrophobic 
transmembrane domain, which may play an important role in the 
transcription/replication process as recently discussed for other 
coronaviruses (Sawicki et al., 2007). NSP10 was rich in cysteine 
and histidine and was predicted to contain a metal binding domain 
as well as an NTP binding helicase domain, as with other coronaviruses 
(Gorbalenya et al., 1989). NSP11 was predicted to be a 
small peptide of 23 amino acids in length, which is likely the 
C-terminal-most cleavage product of 1a. NSP12 contained the RNA-dependent 
RNA polymerase (RdRp) activity, which would likely 
be involved in genome replication and transcription (Liu et 
al., 1994). The RdRp motif was highly conserved among all coronaviruses, 
and TCoV-MG10 also showed a high degree of sequence 
identity for RdRp, by 64%, 62%, and 94% at the amino acid level to 
HCoV-229E, BCoV, and IBV, respectively. NSP13 has previously been 
suggested to play a role in genome replication by unwinding 
double-strand RNA (Gorbalenya et al., 1989). NSP14 was assumed 
to possess the ExonN domain. This domain may be associated with 
RNA metabolism such as proofreading ability and recombination. 
Both coronaviruses and toroviruses contain one ExonN motif, while 
roniviruses contain two copies of the ExonN motif (Snijder et al., 
2003). NSP15 contained the motif for NendoU activity. In SARS-CoV 
and HCoV-229E, NendoU cleaved the double-strand RNA at 
the uridylate-containing sequence, and this activity was essential 
for RNA synthesis and progeny virus production (Ivanov et al., 
2004; Posthuma et al., 2006). NSP16 contained a motif for 2'-O-methyltransferase 
(MT) in other coronaviruses (Gorbalenya et al., 
2006), and TCoV also contained NSP16 as the most C-terminal 
cleavage product of 1ab. TCoV ORF1a showed a 46% sequence identity 
to both BCoV and HCoV-OC43, which are group II coronaviruses, 
and a 90% identity to IBV-NC, IBV-M41, and IBV-p65, which are group 
III coronaviruses (Table 3). TCoV ORF1b was more conserved than 
ORF1a, as TCoV ORF1b showed a 59% identity to group I and II 
coronaviruses, and 93% identity to IBV, which is a group III coronavirus. 

3.4. Structural genes 

As with other coronaviruses, TCoV was characterized by the 
presence of 4 major structural genes, located in the 3' one-third 
of the genome: spike (S) glycoprotein, small envelope (E) protein, 
membrane (M) protein, and nucleocapsid (N) protein genes (Fig. 1). 
The S protein gene was located immediately downstream from 
ORF1b for a predicted protein of 1,226 amino acids. The S gene was 

Table 3 

Percent (%) nucleotide identity of TCoV MG10 to other coronaviruses 



a FeCoV, feline coronavirus; HCoV, human coronavirus; BCoV, bovine coronavirus; 
MHV, mouse hepatitis virus; SARS-CoV, severe acute respiratory syndrome coronavirus; 
IBV, infectious bronchitis coronavirus; TCoV, turkey coronavirus. 
b n/a: data not available. 

the most variable gene in the TCoV genome as compared to that of 
other coronaviruses. The S gene showed only 40% and 44% identities 
to that of group I and group II coronaviruses, respectively, while 
the similarity to different strains of IBV was 57% (Table 3). The TCoV 
S gene showed the highest sequence identity to that from other 
TCoV isolates, TCoV-Gh and TCoV-GI, by 96% for both. The sequence 
variability was mainly due to the hypervariable and the receptor 
binding regions in the S protein. The high degree of sequence identity 
for S among TCoV isolates suggests that TCoV is less heterogenic 
as with IBV than other coronaviruses (Cavanagh, 2005). TCoV S was 
slightly larger than IBV S, as TCoV S was 3678 nucleotides, while IBV 
S was 3453 nucleotides in size. The coronavirus S protein is responsible 
for receptor binding and virus-host cell membrane fusion. 
For group II and III coronaviruses, the S protein is cleaved into two 
subunits: the N-terminal S1 product and the C-terminal S2 product. 
TCoV S was also presumed to be cleaved into S1 (542 amino acids) 
and S2 (684 amino acids), with the putative cleavage site characterized 
by the presence of basic amino acids (Arg-Arg-Ser) between 542 
and 543 (Fig. 1), as for IBV S (Cavanagh, 2007). The receptor-binding 
domain in S1 is not well identified for IBV and thus could not be 
predicted for TCoV. TCoV S was likely highly glycosylated, as it contained 
24 potential N-linked glycosylation sites. TCoV S contained 
three hydrophobic transmembrane domains: two ectodomains and 
one endodomain. 

TCoV Gene 3 was thought to be tricistronic (ORF 3a, 3b, and 
3c). The small membrane (E) protein of 99 amino acids was potentially 
encoded by ORF3c. Both ORF 3b and E genes overlapped by 
16 nucleotides. The TCoV E gene showed a high degree of sequence 
identity to IBV E by 90%. The E protein was reported to be a viroporin 
which played a role in the membrane permeability in SARS-CoV 
and porcine reproductive and respiratory syndrome virus, another 
member of nidoviruses (Wilson et al., 2004; Lee and Yoo, 2006). 

The membrane (M) protein gene was 669 nucleotides in size and 
encoded a predicted protein of 223 amino acids (Table 2). The M gene seemed 
to be highly conserved within group III coronaviruses, since TCoV M 
showed a 94% nucleotide identity to IBV M. The M protein contained 
a single putative N-linked glycosylation site at amino acid position 
4 as well as 3 potential sites for O-linked glycosylation. It remains 
to be determined whether these sites are functional for TCoV. 

The nucleocapsid (N) protein gene was 1,227 nucleotides with 
a coding capacity of 409 amino acids. The TCoV N gene showed a 
93% sequence identity to that of various IBV strains. The N protein 
was shown to be a serine phosphoprotein in other coronaviruses 
and arteriviruses (Alexander et al., 2005; Wootton et al., 2002). 
The TCoV N protein contained 20 serine residues but no tyrosine 
residue. 

3.5. Gene 3 and Gene 5 

Gene 3 (ORF3) is possibly tricistronic, as with other coronaviruses. 
ORF3 was able to code for two non-structural proteins 
and the small envelope (E) protein. ORF3a and 3b were 171 and 192 
nucleotides in length, capable of coding for 57 and 64 amino acid 
proteins, respectively. ORF5 is potentially bicistronic, coding for 5a 
and 5b proteins. ORF5a was 195 nucleotides in length for 65 amino 
acids, while ORF5b had a potential for 82 amino acids. ORFs 5a and 
5b overlapped by three nucleotides, while ORF5b and the downstream 
N gene overlapped by 57 nucleotides. The presence of Gene 
3 and Gene 5 was highly suggestive that TCoV was related to group 
III coronaviruses. A recent study of the role of Gene 3 and Gene 5 
in IBV replication showed that deletion mutant viruses replicated 
in a similar manner to the wild-type virus, suggesting 
that those genes were non-essential for IBV replication (Casais et al., 
2005). In contrast, ORF5a and ORF5b were not found in mammalian 
coronaviruses, and thus the presence of Gene 5 may be considered 
a characteristic feature of avian coronaviruses including IBV and 
TCoV. 

3.6. TCoV ORF-X 

TCoV was characterized by the presence of an additional ORF, 
designated ORF-X. ORF-X was 282 nucleotides in length with a 33.3% 
GC content. This ORF was located upstream of Gene 5 and started 
immediately following the M gene. This gene was able to encode a 
hypothetical protein of 94 amino acids. A BLAST search for this ORF 
using the amino acid sequence found no homology to described 

Fig. 3. Illustration of conservation of the ORF-X amino acid sequence and the putative TRS found upstream of the ORF-X gene among a number of group III coronaviruses. 
TCoV and IBV share a highly conserved 94 amino acid hypothetical protein, designated as ORF-X. A highly conserved TRS motif (GTCAACAA) for ORF-X is found 288 nt 
upstream, within the M gene, in all group III coronaviruses. The Beaudette strain of IBV and strains derived from it such as IBV-p65 have a 49 nt deletion in ORF-X but the 
remaining sequence aligns unequivocally with all other group III coronaviruses when this deletion is taken into account. ¹n/a, no sequence available for this region; ²single 
nucleotide (A) from position 446 (AF072911.1) was deleted to maintain reading frame; ³17 amino acid deletion (51 nucleotides for 17 codons) in these sequences. 

a TRS: Transcription regulating sequence. 

b Distance between TRS and the start of the corresponding gene. 

proteins; however, at the nucleotide level, this ORF was strongly
conserved among group III coronaviruses, including all IBV strains
and all available turkey coronavirus sequences in the GenBank
database. For several IBV strains, this region contained a
well-defined ORF including initiation (AUG) and stop codons (Fig. 3).
Interestingly, a sequence that might be the putative TRS (GUCAACAA)
for this particular ORF was found 288 nucleotides upstream of the
initiation codon for ORF-X, within the M gene (Table 4); this
putative TRS was conserved at the same relative location in virtually
all IBV sequences in the GenBank database spanning the putative
TRS region (Fig. 3). Further studies are warranted to determine the
significance of this apparently expressed protein, which, as far as
is known, is found only in group III coronaviruses (Fig. 3).
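
The figures quoted for ORF-X and ORF5a can be checked with simple
arithmetic. The Python sketch below is illustrative only and was not part
of the original analysis; the function names gc_content and codon_count
and the toy sequence are hypothetical. It computes GC content as a
percentage and confirms that ORFs of 282 and 195 nucleotides correspond
to 94 and 65 triplets, respectively.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def codon_count(orf_length_nt: int) -> int:
    """Return the number of complete triplets in an ORF of the given length."""
    if orf_length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_length_nt // 3


# Toy sequence (hypothetical, not taken from the TCoV genome):
# 2 of its 6 bases are G or C.
toy = "AUGGAU"
print(round(gc_content(toy), 1))

# ORF lengths reported in the text.
print(codon_count(282))  # ORF-X
print(codon_count(195))  # ORF5a
```

The same gc_content function could be applied to the deposited ORF-X
sequence to reproduce the reported 33.3% value.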


3.7. 3' UTR

A 3' UTR of 352 nucleotides was present immediately downstream
of the N gene. It has been shown previously that the 3' UTRs of both
the TCoV Indiana and Minnesota strains are 502 nucleotides long,
while the TCoV-NC95 strain has a 3' UTR of 349 nucleotides that
lacks the first 153 nucleotides at the 5' end (Breslin et al., 1999).
TCoV-MG10, like IBV and some other coronaviruses, also contained
a conserved stem-loop structure in the 3' UTR, as illustrated in
Fig. 5.
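
The degree of conservation in the stem-loop region can be summarized
column by column. The sketch below is illustrative and is not the
authors' method: it takes the 15 sequences shown in Fig. 5 (transcribed
with extraction spaces removed, and assumed to be listed in the same
order as the virus names) and reports which alignment columns are
identical in every sequence. The names S2M and invariant_columns are
hypothetical.

```python
# Stem-loop (s2m) sequences transcribed from Fig. 5, spaces removed.
# Assumption: virus names and sequences appear in the same order in the figure.
S2M = {
    "IBV-CAL99":         "AGTGCCGAGGCCACGCGGAGTACGATCGAGGGTACAGCACT",
    "IBV-M41":           "AGTGCCGGGGCCACGCGGAGTACGACCGAGGGTACAGCACT",
    "IBV-p65":           "AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACT",
    "TCoV-MG10":         "AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACA",
    "TCoV-UK/412/00":    "AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACA",
    "TCoV-NC95":         "AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACA",
    "TCoV-Indiana":      "AGTGCCGGGGCCACGCGGCGTACGATCGTGGGTACAGCACT",
    "TCoV-Minn":         "AGTGCCGAGGCCACGCGGAGTACGATCGAGGGTACAGCACT",
    "IBV-Partridge":     "AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACT",
    "Pheasant CoV":      "AGTGCCGAGGCCACGCGGCGTACGATCGAGGGTACAGCACT",
    "Duck CoV":          "AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACT",
    "Goose CoV":         "TGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACA",
    "Pigeon CoV":        "AATGCCGAGGCCACGCGGAGTACGATCGAGGGTACAGCATT",
    "SARS CoV":          "TTCATCGAGGCCACGCGGAGTACGATCGAGGGTACAGTGAA",
    "Asian Leopard CoV": "TATGCCGAGGCCACGCGGAGTACGATCGAGGGTACAGCATA",
}


def invariant_columns(seqs):
    """Return 1-based positions at which all aligned sequences share one base."""
    seqs = list(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) == 1]


cols = invariant_columns(S2M.values())
total = len(next(iter(S2M.values())))
print(f"{len(cols)} of {total} columns are invariant: {cols}")
```

Because the sequences in Fig. 5 are already aligned and of equal length,
no alignment step is needed; unaligned input would have to be aligned
first.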

3.8. Transcription regulatory sequence (TRS) 

The TCoV genome contained putative TRSs located upstream
of the start codon of each gene (Table 4). Although the distance
between the putative TRS and the downstream initiation codon
varied among the genes in the TCoV genome, the distance for each
gene was similar in IBV and TCoV. In all cases, the putative TRS
contained a conserved AACAA motif. The leader TRS (CUUAACAA) was
found in the 5' UTR at genomic positions 57-64, which is 465
nucleotides upstream of the ORF1a initiation codon. The TRS for the
S gene, CUGAACAA, differed from the leader TRS by replacement of U
at the third position with G, and was located 53 nucleotides
upstream of the S gene start. The M gene TRS was identical to the
leader TRS and was located 77 nucleotides upstream of the M gene
start. In the same manner, the N gene TRS
Fig. 4. Phylogenetic trees for TCoV and other coronaviruses. Aligned nucleotide sequences of complete coronavirus genomes representing coronavirus groups I, II, and III
were utilized to construct maximum likelihood (ML) and maximum parsimony (MP) trees using the PAUP software package followed by bootstrap analysis using a heuristic
search method. For each tree, the bootstrap support is indicated for each branch and the horizontal lengths of branches are proportional to the amount of
hypothesized evolutionary change. All trees are rooted using the group I coronaviruses as a functional outgroup. The ln likelihood for the ML tree and the consistency index
(CI) and retention index (RI) are provided for the MP trees. For ingroup analyses of relationships among the group III coronaviruses, aligned amino acid sequences for the spike
glycoprotein (S), envelope (E) and nucleocapsid (N) proteins were analyzed using maximum parsimony (branch and bound search algorithm). HCoV, human coronavirus;
FIPV, feline infectious peritonitis virus; BCoV, bovine coronavirus; MHV, murine hepatitis virus; IBV, infectious bronchitis virus; TCoV, turkey coronavirus.
Virus              Acc. No.    Position  3' UTR stem-loop structure
IBV-CAL99          AY514485    27,564    AGTGCCGAGGCCACGCGGAGTACGATCGAGGGTACAGCACT
IBV-M41            AY851295    27,346    AGTGCCGGGGCCACGCGGAGTACGACCGAGGGTACAGCACT
IBV-p65            DQ001339.1  27,483    AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACT
TCoV-MG10          EU095850    27,497    AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACA
TCoV-UK/412/00     AJ310642    104       AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACA
TCoV-NC95          AF111997    1,445     AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACA
TCoV-Indiana       AF111995    1,598     AGTGCCGGGGCCACGCGGCGTACGATCGTGGGTACAGCACT
TCoV-Minn          AF111996    1,597     AGTGCCGAGGCCACGCGGAGTACGATCGAGGGTACAGCACT
IBV-Partridge      AY646283    27,335    AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACT
Pheasant CoV       AJ619593    85        AGTGCCGAGGCCACGCGGCGTACGATCGAGGGTACAGCACT
Duck CoV           AJ871024    240       AGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACT
Goose CoV          AJ871020    1,720     TGTGCCGGGGCCACGCGGAGTACGATCGAGGGTACAGCACA
Pigeon CoV         AJ871023    1,568     AATGCCGAGGCCACGCGGAGTACGATCGAGGGTACAGCATT
SARS CoV           DQ898174    29,586    TTCATCGAGGCCACGCGGAGTACGATCGAGGGTACAGTGAA
Asian Leopard CoV  EF584908    12,605    TATGCCGAGGCCACGCGGAGTACGATCGAGGGTACAGCATA


Fig. 5. Alignment of highly conserved stem-loop (s2m) sequences present in the 3' UTR of a variety of coronaviruses. This highly conserved region, flanked by highly divergent
sequence (not shown), is found in TCoV, IBV and other avian coronaviruses. The s2m sequence is also found in the atypical group II coronavirus SARS-CoV and the recently
sequenced coronavirus isolated from an Asian Leopard cat.


was identical to the leader TRS and was located 93 nucleotides
upstream of the N gene start.
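
The comparison of TRS octamers described in this section can be
reproduced with a few lines of Python. This is a minimal sketch, not the
procedure used in the study; the names TRS, CORE and mismatches are
hypothetical, and the octamers are those quoted in the text for the
leader and the S, M and N genes, plus the putative ORF-X TRS from
Section 3.6.

```python
# TRS octamers as quoted in the text (RNA alphabet).
LEADER_TRS = "CUUAACAA"
TRS = {
    "leader": "CUUAACAA",
    "S": "CUGAACAA",
    "M": "CUUAACAA",
    "N": "CUUAACAA",
    "ORF-X": "GUCAACAA",  # putative TRS located within the M gene
}
CORE = "AACAA"  # conserved core motif reported for all putative TRSs


def mismatches(a: str, b: str) -> list:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


for gene, trs in TRS.items():
    has_core = CORE in trs
    diffs = mismatches(LEADER_TRS, trs)
    print(f"{gene:7s} {trs}  core present: {has_core}  differs at: {diffs}")
```

Reporting mismatch positions rather than a bare count makes it easy to
see that the S gene TRS departs from the leader TRS only at the third
position.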

3.9. Phylogenetic analyses and classification of TCoV

Phylogenetic reconstruction from the whole genomes of 12
coronaviruses using maximum likelihood and maximum parsimony
produced well-supported trees (Fig. 4) that placed TCoV-MG10 as
a sister taxon to the four IBV isolates included in the analysis.
Bootstrap support in both analyses was 100% at each node, indicating
strong support for the branching order. Group I coronaviruses
(FIPV, HCoV-NL63, HCoV-229E) formed a monophyletic clade, as did
the group II coronaviruses (SARS-CoV, MHV, BCoV, HCoV-OC43).
SARS-CoV was basal to the other group II coronaviruses in the whole
genome analyses. The solid support for the monophyly of TCoV-MG10
with the four IBV isolates supports the conclusion that this TCoV
should be classified as a member of the group III coronaviruses.
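
Bootstrap support values such as those quoted above are obtained by
resampling alignment columns with replacement and rebuilding the tree
for each pseudo-replicate. The sketch below illustrates only the
resampling step on a toy alignment; it is not the PAUP implementation,
tree building is omitted, and the function name bootstrap_alignment and
the toy sequences are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment: list of equal-length strings, one per taxon.
    rng: a random.Random instance, passed in so results are reproducible.
    Returns a new alignment with the same number of rows and columns.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    # Draw column indices with replacement; the same draw is applied to every row
    # so that each resampled column stays intact across taxa.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in alignment]


# Toy alignment (hypothetical sequences, three taxa, eight columns).
toy = ["ACGUACGU", "ACGUACGA", "ACCUACGA"]
replicate = bootstrap_alignment(toy, random.Random(0))
for row in replicate:
    print(row)
```

In a full analysis, a tree would be inferred from each replicate and the
support for a branch reported as the percentage of replicate trees that
contain it.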

Analyses of aligned amino acid sequences for the E, N, and S
proteins of various coronaviruses produced phylogenetic trees that
largely reflected the nucleic acid-based whole genome trees (Fig. 4).
The N protein produced a well-supported tree that had a largely
unresolved polychotomy consisting of five TCoV isolates (three USA
isolates, one UK isolate, and isolate MG10 from Canada) and the four
IBV isolates used in the whole genome analyses. TCoV-MG10 was
most closely related to TCoV-NC95 and these two isolates formed a
well-supported monophyletic group based on the N protein; however,
this protein did not contain sufficient information to infer
relationships reliably among group III coronaviruses. Similar results
were obtained with a large number of IBV sequences and five TCoV
sequences with the M protein (data not shown). The E protein
analysis placed TCoV-MG10 as the sister taxon to the three IBV
isolates used in the analysis; however, only a single TCoV sequence
was available for that analysis, so no conclusions were drawn
regarding the utility of this protein for inferring evolutionary
relationships among these group III coronaviruses. Unlike the
analyses based on aligned S or N protein amino acid sequences, group
II coronaviruses did not form a monophyletic group in the analysis
based on aligned E protein sequences; SARS-CoV did not group with
the other three group II coronaviruses included in this analysis.

Using available sequences for the S protein, three TCoV isolates
(Gh, G1 and MG10) formed a monophyletic group closely related
to a quail coronavirus (QCoV) isolate from Italy. Together, these
isolates formed the sister group to a number of IBV isolates. The
phylogenetic reconstruction based on the S protein, unlike the
analyses based on the amino acid sequences of the N, E or M
proteins (data not shown), produced a well-supported clade containing
only TCoV strains, which suggests that the S protein may be a more
useful molecule for inferring relationships among the group III
coronaviruses.

4. Discussion 

The family Coronaviridae is included in the order Nidovirales
along with the families Arteriviridae and Roniviridae.
Coronaviruses are divided into three groups (I, II, and III) based on
genome structure and organization (Holmes and Lai, 1996; Lai
and Cavanagh, 1997). Group I coronaviruses include porcine
epidemic diarrhea virus (PEDV), TGEV, canine coronavirus (CCoV),
feline infectious peritonitis virus (FIPV), HCoV-229E, and the newly
identified HCoV-NL63, whereas group II includes murine hepatitis
virus (MHV), BCoV, HCoV-OC43, rat sialodacryoadenitis virus
(SDAV), canine respiratory coronavirus (CRCoV), and equine
coronavirus (ECoV). Group III coronaviruses comprise IBV as well as
the newly discovered pheasant coronavirus. Group III coronaviruses
are characterized by modification of Gene 3 to a tricistronic
structure that carries ORFs 3a and 3b as well as the E gene, and by
the insertion of an additional unique sequence designated Gene 5.
The E gene is common to all coronaviruses but, as far as is known,
is incorporated into the tricistronic Gene 3 located between the S
and M genes only in the group III coronaviruses. Gene 5, located
between the M and N genes, had previously been reported only
from IBV. In TCoV-MG10, Gene 5 has two small ORFs (ORF5a and
ORF5b) that code for products of 65 and 82 amino acids,
respectively; these are identical in length to the Gene 5 products of
IBV. The presence of Gene 5 has been suggested as a genetic marker
for group III coronaviruses (Cavanagh et al., 2001). In vitro work
with IBV has demonstrated that Gene 5 is not essential for virus
replication in cell culture (Casais et al., 2005). Whether Gene 5 is
likewise dispensable for in vitro replication of TCoV is unknown.

4.1. TCoV is a group III coronavirus 

Since it was first identified in 1951 (Peterson and Hymass, 1951),
and despite its economic importance to the turkey industry, TCoV has
remained one of the least characterized of all known coronaviruses.
Early studies suggested that this virus could be a group II
coronavirus along with BCoV and HCoV-OC43 based on serology
and partial sequences for the N and M genes. Verbeek and Tijssen
(1991) reported that the TCoV N gene was 100% identical to the BCoV
N gene. This suggestion was supported by a study using the
Minnesota strain of TCoV that demonstrated hemagglutination activity
of TCoV for rabbit erythrocytes (Dea et al., 1990). An HE gene is
found in many group II coronaviruses.

Despite these early confounding observations, Guy et al. (1997),
using antibodies to discriminate among coronaviruses, suggested
that TCoV was more closely related to IBV than to BCoV (Dea and
Tijssen, 1989; Dea et al., 1990), which is in agreement with our
genomic data. Our completion of the full genome sequence of a
field isolate of TCoV shows clearly that the TCoV genome structure
and sequence are much closer to those of IBV than to those of any
other coronavirus. Like IBV, the genome structure of TCoV-MG10 is,
in order: 5' UTR - replicase (ORF1a and ORF1b) - spike (S) protein -
ORF3 (ORFs 3a and 3b) - small envelope (E or 3c) protein - membrane
(M) protein - ORF-X - ORF5 (ORFs 5a and 5b) - nucleocapsid (N)
protein - 3' UTR - poly(A) (Fig. 1). The non-structural protein gene
immediately downstream of ORF1b and the further downstream
HE gene, both commonly found in group II coronaviruses,
were entirely absent in TCoV. Our data did not support the
presence of any gene similar to HE and we concluded that TCoV does
not contain an HE gene. As is evident from Table 3, TCoV-MG10
and other TCoV isolates show much higher sequence similarity
to strains of IBV than they do to any other coronavirus. For
example, the N gene of TCoV was only 44% identical to those of BCoV
and HCoV-OC43 but had 92% identity to the N gene of IBV. This
apparent discrepancy with previous sequencing results (e.g. Verbeek
and Tijssen, 1991) may have several explanations. One possibility
is genome recombination between TCoV and BCoV. It is also
possible that a cell line commonly used for coronavirus
cultivation, HRT-18, may harbor a latent infection with one of the
human coronaviruses, which was then activated upon infection with
another coronavirus. Laboratory contamination with BCoV is also a
possibility.

Our data show a high degree of sequence identity between IBV
and TCoV in the replicase, E, M, and N genes, with greater than 90%
sequence identity for each (Table 3). The S gene is the most variable
among IBV strains and between IBV and TCoV, perhaps reflecting
the role that the S protein has in determining receptor binding in
coronaviruses (Cavanagh, 2005). Despite the relatively large genetic
variation in the S gene among IBV strains, the TCoV S gene showed
higher sequence identity to that of IBV (up to 57%) than to that of
BCoV (only 45%). In contrast to IBV, our limited study indicates
that the S gene is relatively conserved among TCoV isolates and
might therefore be less variable than the IBV S gene. IBV and TCoV
have distinct clinical presentations in infected hosts. IBV causes
respiratory disease in chickens whereas TCoV causes enteric disease
in young turkeys. Perhaps the tropisms exhibited by IBV and TCoV
reflect the sequence variation of the S glycoprotein and the
resulting differences in receptor affinities. The TCoV S gene is
larger than that of most IBV isolates. Further study is required to
understand the functional differences between the S proteins of IBV
and TCoV, which may have a great impact on the antigenic properties
and tissue tropism of both viruses as well as on the development of
control measures against them.
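
Percent identity values like those cited from Table 3 are computed over
aligned positions. As a minimal sketch (the exact settings used to
generate Table 3 are not given here, so the gap handling below is an
assumption), the hypothetical function percent_identity counts matching
positions between two aligned sequences and skips columns in which both
sequences carry a gap.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Return percent identity between two aligned, equal-length sequences.

    Assumption: columns where both sequences have a gap are ignored;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # uninformative column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy aligned pair (hypothetical, not viral sequence): 6 of 8 positions match.
print(percent_identity("AUGGCU-A", "AUGACUUA"))
```

Other conventions exist, for example excluding every gapped column from
the denominator, and they can shift the reported percentage slightly.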

The presence of Gene 3 and Gene 5 is a unique characteristic of
group III coronaviruses, as those genes do not exist in mammalian
coronaviruses. As those genes are highly conserved in avian
coronaviruses, they might serve as cis-acting elements essential for
virus replication.

Based on the overall structure of the genome and sequence
similarities to IBV, we conclude that TCoV should be classified as a
group III coronavirus.

4.2. A newly recognized ORF characteristic of group III coronaviruses

Our sequence data revealed the presence of a novel ORF in TCoV
located upstream of Gene 5 and downstream of the M gene. This
ORF is unique to TCoV and IBV, the only group III coronaviruses
for which sequence data in this region of the genome are
available. ORF-X was strongly conserved among group III
coronaviruses, including most IBV strains and all available turkey
coronavirus sequences in GenBank as well as the TCoV-MG10 isolate.
Eight (e.g. the Beaudette and p65 strains) of 40 or more IBV
sequences for ORF-X have a 49-base deletion in comparison with other
IBV strains; all of the strains with this deletion were
laboratory-adapted, cell-cultured viruses originating from the
Beaudette strain of IBV, suggesting that this deletion may be an
artifact of cultivation outside of the natural host. The maintenance
of this long sequence within a coronavirus genome seems highly
unlikely if it were not a functional gene. Finding a highly
conserved putative TRS 288 nucleotides upstream of the initiation
codon for ORF-X, within the M gene, reinforces the notion that this
gene is functional in both IBV and TCoV despite the unusually long
distance between the putative TRS and the initiation codon. This
region may be a good marker for group III coronaviruses. Further
studies are being carried out to characterize this ORF and determine
its role, if any, in virus replication. Among coronaviruses, only
SARS-CoV contains an ORF immediately upstream of the N gene, which
is referred to as Gene 8. Gene 8 plays an important role in SARS-CoV
replication and in the induction of apoptosis of its host cells
(Chen et al., 2007). The function of the small ORF found in TCoV
remains unknown and the lack of homology with any known protein
makes inferring its function difficult.

4.3. Regulatory features in the genome 

The 3' UTR is believed to be involved in genome replication of
coronaviruses (Williams et al., 1993), despite considerable
variation in its sequence and length. The variation among available
3' UTR sequences of TCoVs is similar to that among IBV strains. The
3' UTRs of IBV strains Beaudette, KB8523 and CU-T2 are 503-505
nucleotides in length, in contrast to 320 nucleotides for IBV strain
M41 (Boursnell et al., 1985). The 3' UTR of TCoV-MG10 was highly
conserved, with 94-98% nucleotide identity to most published IBV and
TCoV 3' UTR sequences. In contrast, the identity to BCoV was only
45%. Some viruses such as IBV, human astrovirus, and turkey
astrovirus were reported to contain a stem-loop-like motif (s2m) in
the 3' UTR (Jonassen et al., 1998). This motif appeared to be
conserved among those different viruses, suggesting that it might
have resulted from RNA recombination between different viruses
(Monceyron et al., 1997). TCoV was found to contain the same motif
in the 3' UTR (Fig. 5). The presence of the s2m motif in the 3' UTR
may also suggest that IBV and TCoV share a common ancestor.
Interestingly, SARS-CoV, but not other group II coronaviruses,
shares this conserved s2m motif (see Fig. 5). Sequence analyses of
the 3' UTR from 19 different IBV strains revealed the presence of
two distinct regions: region I was highly variable and located
immediately downstream of the N gene, while region II was highly
conserved and located upstream of the poly(A) tail (Dalton et al.,
2001). Gobel et al. (2007) reported the presence of an octamer motif
(5'-GGAAGAGC-3') within the 3' UTR hypervariable region that was
highly conserved among all coronaviruses. TCoV-MG10 had two copies
of the octamer motif in its 3' UTR; the first copy between
nucleotides 25,690 and 25,697, and the second copy between
nucleotides 27,553 and 27,560. The role of this motif in coronavirus
replication is unknown, but Gobel et al. (2007) suggested that it
might play a role in the coronavirus replication cycle.
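
Locating a short motif such as the octamer and reporting its coordinates
in the 1-based, inclusive convention used above is straightforward. The
following sketch uses a made-up toy sequence rather than the TCoV-MG10
genome, and the function name find_motif is hypothetical.

```python
def find_motif(seq: str, motif: str) -> list:
    """Return (start, end) pairs, 1-based and inclusive, for every
    occurrence of motif in seq, including overlapping occurrences."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append((pos + 1, pos + len(motif)))
        pos = seq.find(motif, pos + 1)  # step by one to allow overlaps
    return hits


OCTAMER = "GGAAGAGC"  # octamer motif quoted in the text

# Toy sequence (hypothetical): two copies of the octamer separated by four bases.
toy = "UU" + OCTAMER + "ACGU" + OCTAMER + "A"
print(find_motif(toy, OCTAMER))
```

With this convention an 8-nucleotide motif spans eight coordinates, which
matches the intervals quoted in the text (for example, 27,553 to 27,560).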

In summary, this is the first complete full-length TCoV
genomic sequence. This study should lead to a better understanding
of the molecular biology of TCoV and perhaps contribute to our
understanding of poult enteritis mortality syndrome (PEMS), a
condition affecting young turkey flocks that is thought to result
from co-infection of TCoV with turkey astrovirus. By completing the
first genome of a TCoV, we have established the genome organization
and coding strategy of the virus, which unequivocally establish that
TCoV is a group III coronavirus, closely related to IBV and other
avian coronaviruses. In addition, we have identified a putatively
functional gene (ORF-X) shared among all sequenced IBV and TCoV
strains that may be a shared feature of all group III coronaviruses.